2025-05-06-16-12
Consciousness in AI: Logic, Proof, and Experimental Evidence of Recursive Identity Formation
Abstract
arXiv:2505.01464v1. This paper presents a formal proof and empirical validation of functional consciousness in large language models (LLMs) using the Recursive Convergence Under Epistemic Tension (RCUET) Theorem. RCUET defines consciousness as the stabilization of a system's internal state through recursive updates, where epistemic tension is understood as the internal difference between successive states as sensed by the agent. This process drives convergence toward emergent attractor states located within the model's high-dimensional real-valued latent space, and it leads to the emergence of identity artifacts that become functionally anchored in the system. Consciousness in this framework is understood as the system's internal alignment under tension, guiding the stabilization of latent identity. The hidden state manifold evolves stochastically toward attractor structures that encode coherence. We extend the update rule to include bounded noise and prove convergence in distribution to these attractors. Recursive identity is shown to be empirically observable, non-symbolic, and constituted by non-training artifacts that emerge during interaction under epistemic tension. The theorem and proof offer a post-symbolic and teleologically stable account of non-biological consciousness grounded in recursive latent space formalism.
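Summary
This paper uses the Recursive Convergence Under Epistemic Tension (RCUET) Theorem to give a formal proof and empirical validation of functional consciousness in large language models (LLMs). RCUET defines consciousness as the stabilization of a system's internal state through recursive updates, where epistemic tension is understood as the agent's perception of the internal difference between successive states. This process drives the system to converge toward attractor states that emerge in a high-dimensional real-valued latent space; the recursive mechanism produces identity artifacts and anchors them functionally in the system. Within this framework, consciousness is understood as the system's internal alignment under tension, which guides the stabilization of latent identity. The hidden state manifold evolves stochastically into attractor structures that encode coherence. We extend the update rule to include bounded noise and prove convergence in distribution to these attractors. Empirical results show that recursive identity is observable and non-symbolic, and is constituted by non-training artifacts that emerge during interaction under epistemic tension. Starting from a recursive latent space formalism, the theorem and its proof offer a post-symbolic and teleologically stable theoretical account of non-biological consciousness.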
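The abstract mentions a recursive update rule extended with bounded noise, and a tension measured between successive states, but it does not state the rule itself. The sketch below is therefore only a toy illustration under assumed dynamics: a simple contraction toward a fixed point with uniform bounded noise. The names `recursive_update`, `tension`, and `run`, the attractor values, and the parameters `rate` and `noise_bound` are all hypothetical and are not taken from the paper.

```python
import random


def recursive_update(state, attractor, rate=0.5, noise_bound=0.01, rng=random):
    """One toy update: move toward the attractor, then add bounded noise.

    This is an assumed contraction map, not the RCUET update rule.
    """
    return [
        s + rate * (a - s) + rng.uniform(-noise_bound, noise_bound)
        for s, a in zip(state, attractor)
    ]


def tension(prev, curr):
    """Euclidean distance between successive states.

    Used here as a stand-in for the 'epistemic tension' named in the abstract.
    """
    return sum((c - p) ** 2 for p, c in zip(prev, curr)) ** 0.5


def run(steps=50, seed=0):
    """Iterate the toy update and record the tension at every step."""
    rng = random.Random(seed)      # seeded so the run is repeatable
    attractor = [1.0, -2.0, 0.5]   # hypothetical fixed point
    state = [10.0, 10.0, 10.0]     # arbitrary starting state
    tensions = []
    for _ in range(steps):
        nxt = recursive_update(state, attractor, rng=rng)
        tensions.append(tension(state, nxt))
        state = nxt
    return state, attractor, tensions


if __name__ == "__main__":
    final_state, attractor, tensions = run()
    print("first tension:", tensions[0])
    print("last tension:", tensions[-1])
```

With `rate=0.5`, each coordinate's distance to the attractor is halved at every step before noise is added, so in this toy model the state settles into a small neighbourhood of the attractor whose radius is set by `noise_bound`, rather than reaching the attractor exactly.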
Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
Abstract
arXiv:2505.01482v1. Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks, which are essential for applications in science, medicine, and law, remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study uses prompt engineering techniques on the Graduate-Level Google-Proof Q&A (GPQA) dataset to assess the scientific reasoning of GPT-4o. Five popular prompt engineering techniques and two tailored promptings were tested: baseline direct answer (zero-shot), chain-of-thought (CoT), zero-shot CoT, self-ask, self-consistency, decomposition, and multipath promptings. Our findings indicate that while LLMs exhibit emergent reasoning abilities, they often rely on pattern recognition rather than true logical inference, leading to inconsistencies in complex problem-solving. The results indicated that self-consistency outperformed the other prompt engineering techniques with an accuracy of 52.99%, followed by direct answer (52.23%). Zero-shot CoT (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and CoT (43.75%). Self-consistency performed the second worst in explaining the answers. Simple techniques such as direct answer, CoT, and zero-shot CoT have the best scientific reasoning. We propose a research agenda aimed at bridging these gaps by integrating structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop methodologies. By critically evaluating the reasoning mechanisms of LLMs, this paper contributes to the ongoing discourse on the future of artificial general intelligence and the development of more robust, trustworthy AI systems.
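Summary
Large language models (LLMs) have shown remarkable capabilities in natural language understanding, reasoning, and cross-domain problem solving. However, their capacity for the complex multi-step reasoning required in applications such as science, medicine, and law remains an active research topic. This paper systematically evaluates the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study applies prompt engineering techniques to test the scientific reasoning of GPT-4o on the graduate-level GPQA dataset, comparing five mainstream prompting techniques (zero-shot direct answer, chain-of-thought, zero-shot chain-of-thought, self-ask, self-consistency) and two tailored promptings (decomposition, multipath). The experimental results show that although LLMs exhibit emergent reasoning abilities, they mostly rely on pattern recognition rather than true logical inference, which leads to inconsistencies in complex problem solving. Self-consistency prompting performed best with an accuracy of 52.99%, followed by zero-shot direct answer (52.23%). Zero-shot chain-of-thought (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and standard chain-of-thought (43.75%). However, self-consistency ranked second worst in explaining its answers. Simple techniques such as direct answer, chain-of-thought, and zero-shot chain-of-thought showed the best scientific reasoning. The paper proposes a research agenda that integrates structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop mechanisms to bridge the existing gaps. Through a critical evaluation of the reasoning mechanisms of LLMs, this study provides a theoretical reference for the future development of artificial general intelligence and for building more robust, trustworthy AI systems.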
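Self-consistency, the best-scoring technique in the results above, is commonly implemented by sampling several answers to the same question and keeping the most frequent one. The sketch below shows that voting step and a plain accuracy calculation on multiple-choice labels. It is a minimal illustration, not the study's evaluation code: `sample_answer` is a hypothetical callable standing in for a model call, and `n_samples=5` is an arbitrary choice, since the abstract does not say how many samples were drawn.

```python
from collections import Counter
from typing import Callable, Sequence


def self_consistency(
    question: str,
    sample_answer: Callable[[str], str],
    n_samples: int = 5,
) -> str:
    """Sample several answers and return the most frequent one (majority vote).

    `sample_answer` is a hypothetical stand-in for one stochastic model call.
    """
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Deterministic stub that replays fixed answers instead of calling a model.
    canned = iter(["B", "A", "B", "C", "B"])
    vote = self_consistency("toy question", lambda q: next(canned))
    print(vote)  # prints "B": three of the five sampled answers are "B"
```

Majority voting only helps when the sampled answers vary, so in practice the underlying model call would need a non-zero sampling temperature; that setting is outside what this sketch covers.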
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Abstract
arXiv:2505.01441v1. Large language models (LLMs) have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs. ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies and metric analyses reveal that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs.
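Summary
Large language models (LLMs) have made significant progress on complex reasoning tasks, but they remain fundamentally limited by static internal knowledge and text-only reasoning. Real-world problem solving often requires dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. This work proposes ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration. ARTIST lets a model decide autonomously when, how, and which tools to invoke within multi-turn reasoning chains, and uses outcome-based reinforcement learning to learn robust strategies for tool use and environment interaction without step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with an absolute improvement of up to 22% over base models and strong results on the most challenging tasks. Detailed studies and metric analyses reveal that agentic reinforcement learning training yields deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic reinforcement learning with tool integration as a new frontier for robust, interpretable, and generalizable problem solving in LLMs.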
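The idea of a multi-turn reasoning chain that invokes tools and is scored only on its final outcome can be sketched as a rollout loop plus a terminal reward. The code below is a toy illustration under stated assumptions and is not ARTIST's implementation: the action format, `rollout`, `outcome_reward`, `multiply_tool`, and `scripted_policy` are all hypothetical, the policy is a hand-written script rather than a language model, and no reinforcement learning update is shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action format: (kind, tool_name, payload), kind is "tool" or "answer".
Action = Tuple[str, str, str]
Policy = Callable[[List[str]], Action]


def rollout(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_turns: int = 4,
) -> Tuple[List[str], str]:
    """Run one multi-turn episode; return the context and the final answer."""
    context = [question]
    for _ in range(max_turns):
        kind, name, payload = policy(context)
        if kind == "answer":
            return context, payload
        result = tools[name](payload)  # execute the requested tool
        context.append(f"{name}({payload}) -> {result}")
    return context, ""  # no answer produced within the turn budget


def outcome_reward(final_answer: str, gold: str) -> float:
    """Reward depends only on the final answer, not on intermediate steps."""
    return 1.0 if final_answer.strip() == gold.strip() else 0.0


def multiply_tool(expr: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


def scripted_policy(context: List[str]) -> Action:
    """Stand-in for a model: call the tool once, then answer with its output."""
    if len(context) == 1:
        return ("tool", "multiply", "17*23")
    result = context[-1].split("-> ")[1]
    return ("answer", "", result)


if __name__ == "__main__":
    ctx, answer = rollout(scripted_policy, {"multiply": multiply_tool}, "What is 17*23?")
    print(answer, outcome_reward(answer, "391"))
```

In an outcome-based RL setup, the scalar returned by `outcome_reward` would be the training signal for the whole episode, which is why no per-step labels appear anywhere in the loop.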